Multimodal Integration for Meeting Group Action Segmentation and Recognition
نویسندگان
چکیده
We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and four modelling infrastructures: a higher semantic feature approach, multi-layer HMMs, a multistream DBN, as well as a multi-stream mixed-state DBN for disturbed data.
منابع مشابه
Automatic recognition of multiparty human interactions using dynamic Bayesian networks
Relating statistical machine learning approaches to the automatic analysis of multiparty communicative events, such as meetings, is an ambitious research area. We have investigated automatic meeting segmentation both in terms of “Meeting Actions” and “Dialogue Acts”. Dialogue acts model the discourse structure at a fine grained level highlighting individual speaker intentions. Group meeting act...
متن کاملMultimodal Meeting Tracker
Face-to-face meetings usually encompass several modalities including speech, gesture, handwriting, and person identification. Recognition and integration of each of these modalities is important to create an accurate record of a meeting. However, each of these modalities presents recognition difficulties. Speech recognition must be speaker and domain independent, have low word error rates, and ...
متن کاملTowards a Multimodal Meeting Record
Face-to-face meetings usually encompass several modalities including speech, gesture, handwriting, and person identification. Recognition and integration of each of these modalities is important to create an accurate record of a meeting. However, each of these modalities presents recognition difficulties. Speech recognition must be speaker and domain independent, have low word error rates, and ...
متن کاملAutomatic analysis of multiparty meetings
This paper is about the recognition and interpretation of multiparty meetings captured as audio, video and other signals. This is a challenging task since the meetings consist of spontaneous and conversational interactions between a number of participants: it is a multimodal, multiparty, multistream problem. We discuss the capture and annotation of the AMI meeting corpus, the development of a m...
متن کاملInteraction-based Information Retrieval in Multimodal, Online, Artefact-Focused Meeting Recordings
Traditional search operations, topic detection or summarisation of meeting recordings are generally performed using segmentation and indexing techniques originally developed in the field of Multimedia Information Retrieval. However, many assumptions possible in a media production environment are often inadequate when applied to spontaneous meeting recordings, as features such as sudden changes ...
متن کامل